A few years ago, researchers started new projects by requesting
bibliographic searches, contacting fellow researchers, and leafing through
conference proceedings and university technical report lists. Today, the 
Internet can serve all these functions and serve them well, but trying to locate data 
in this sea of information can be a considerable task. 

Resource discovery services now help users locate and retrieve information. 
These services contain tools for browsing, searching, and organizing information 
distributed throughout the Internet. Browsing tools let users navigate the
information space to find the specific data they need. Indexing search tools
automatically locate relevant data on the basis of user interest. Independent
of the approach used, resource discovery services can also help users organize
newfound information, so that they can refer to it without having to repeat
the entire discovery process.

We present an overview of the resource discovery services currently available on 
the Internet. Because resource discovery has been the subject of intense research, 
this article is not meant to be exhaustive.

The Wide Area Information Servers project

Dow Jones and Company, Thinking Machines Corp., Apple Computer, and
KPMG Peat Marwick developed the WAIS project because they were interested
in the information discovery-and-retrieval problem. The project seeks to provide
users with a uniform, easy-to-use, location-transparent mechanism to access 
information. 

WAIS is a full-text information-retrieval architecture whose clients and servers 
communicate through an extension of the Z39.50 protocol standard from the
National Information Standards Organization. 

WAIS architecture. Figure 1 is a schematic view of the architecture. WAIS
clients translate user queries into the WAIS protocol and query the Directory of
Services for relevant databases. WAIS then transmits the request to a selected set
of databases over the communications network. Database servers keep complete
inverted indexes on stored document contents and execute full-text searches on
them. In response to a query, a server returns a list of relevant object descriptors.
These descriptors correspond to documents
that contain words specified in
the user query. The client displays que- 
ry results and can retrieve documents 
from corresponding servers. Clients also 
display a numerical score for each hit. 
The score relates to the frequency of 
query-specified words in the object's 
contents and provides feedback to help 
users refine their queries. 
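
The indexing and scoring just described can be sketched in a few lines of Python. This is a minimal illustration, not the WAIS implementation: the function names, the sample documents, and the choice to score a document by simply summing the occurrences of each query word are assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_inverted_index(documents):
    """Map each word to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][doc_id] = count
    return index


def search(index, query):
    """Return (doc_id, score) pairs, highest score first.

    The score is the total frequency of the query words in the document,
    a deliberately simple stand-in for a real relevance formula.
    """
    scores = Counter()
    for word in query.lower().split():
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical documents held by one database server.
documents = {
    "doc1": "archie indexes file names on ftp sites",
    "doc2": "wais servers index full text and wais clients query them",
    "doc3": "gopher menus link to wais and ftp",
}
index = build_inverted_index(documents)
results = search(index, "wais ftp")
```

Documents that mention none of the query words never enter the result list, which mirrors the way a server returns descriptors only for matching documents.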

WAIS databases. One server indexes 
the descriptors of the other servers. This 
yellow page (as in the telephone book) 
service resides at a well-known address, 
and users query it in the same way as 
they do other servers. Instead of refer- 
encing documents, however, it refer- 
ences participating databases. When a 
new information provider wants to join 
WAIS, it must submit its location,
description, and other relevant
information to the directory server. The WAIS
Directory of Services currently contains 
around 300 registered databases. The 
WAIS directory server's database is 
replicated on a number of servers and 
has its primary copy stored on host 
think.com. 

Discovery session. Figure 2 illustrates 
a WAIS discovery session running on 
an X Windows client. The query first 
arrives at the Directory of Services (see 
the "In Sources" box in the upper win- 
dow), which responds with descriptors 
of pertinent servers. The user can then 
select servers from this set and submit 
the query to them (see the "Resulting 
documents" area in the upper window). 
In this example, two servers are select- 
ed and queried (see the "In Sources" 
box in the lower window). They return 
a set of document descriptors from which 
the user can identify relevant documents 
to be retrieved. To refine the search, 
users can select one or more of the 
returned documents and place them in 
the "Similar to" box. When the query is
rerun, WAIS updates the results to
include similar documents. It ranks
similarity in terms of how many words match.
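
As a rough illustration of ranking by matching words, the sketch below counts the distinct words a candidate shares with the document the user selected. The function names and sample texts are hypothetical, and a real WAIS server may weight words differently.

```python
def similarity(doc_a, doc_b):
    """Number of distinct words the two documents have in common."""
    return len(set(doc_a.lower().split()) & set(doc_b.lower().split()))


def rank_similar(selected, candidates):
    """Order candidate documents by how many words they share with `selected`."""
    scored = [(name, similarity(selected, text)) for name, text in candidates.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical document placed in the "Similar to" box, plus three candidates.
selected = "full text search with inverted indexes"
candidates = {
    "a": "inverted indexes support full text search",
    "b": "anonymous ftp archives",
    "c": "text search tools",
}
ranked = rank_similar(selected, candidates)
```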

Archie 

The Archie 2 service addresses the prob- 
lem of locating files by attribute (files are 
currently listed only by their names) in 
the Internet. Archie was developed at 
McGill University by Alan Emtage and 
Peter Deutsch. Archie servers centralize
indexing information on file-name data
distributed throughout thousands of
Internet public archive sites.

Figure 2. An example WAIS discovery session.

Figure 3. The Archie architecture.

Archie databases. Archie currently 
offers two databases: Filenames and 
Whatis. The Filenames database index- 
es the names of files available from 
hundreds of Internet file-transfer pro- 
tocol (FTP) sites. FTP lets users re- 
trieve files stored on Internet hosts. 
Anonymous FTP, which does not re- 
quire the user to have an account on the 
FTP site, is widely used for distributing 
free software and documents. Archie 
automatically updates entries in the 
Filenames database. Users can query 
this database for file names that match 

• specified patterns, 

• a list of FTP archive sites, or 

• a list of the files available from spe- 
cific sites. 

The Whatis database contains the 
names and descriptions of software pack- 
ages, documents, and other information 
available on the Internet. Entries include 
text strings consisting of keywords and 
associated descriptions. Users can
perform case-insensitive text-string
searches that are applied to both keywords and
corresponding descriptions.
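
A case-insensitive substring search over keyword and description pairs might look like the following sketch. The entries and the function name are invented for illustration and do not reproduce the contents or interface of the actual Whatis database.

```python
def whatis_search(entries, text):
    """Return keywords whose keyword or description contains `text`, ignoring case."""
    needle = text.lower()
    return [
        keyword
        for keyword, description in entries
        if needle in keyword.lower() or needle in description.lower()
    ]


# Hypothetical (keyword, description) entries.
entries = [
    ("xarchie", "X11 client for Archie"),
    ("wais", "Wide Area Information Servers"),
    ("gopher", "Menu-based information browser"),
]
by_keyword = whatis_search(entries, "ARCHIE")
by_description = whatis_search(entries, "Information")
```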

The Whatis database is manually
maintained by the system/database
administrator. Information is gathered
from secondary sources (such as Usenet
postings and an author's e-mail
submission) and entered into the database by
hand. In contrast, data-gathering and
data-maintenance components maintain
the Filenames database.

Archie architecture. Archie clients
access both databases through the
user-access component (UAC). The
data-gathering component (DGC) relies on
FTP site administrators to find out about
new FTP archives. Every time a new
FTP archive is reported, an entry
corresponding to the new site is added to the
site descriptions file, which lists all known
FTP sites. Periodically, the DGC
connects to each known FTP site and
fetches a recursive listing of its contents. This
information remains on the Archie
server in the raw listing files until it is
processed by the data-maintenance
component (DMC), which converts the
listings into a format that can be added
to the Filenames database. Figure 3
presents the Archie architecture.
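
The conversion step performed by the DMC, followed by a pattern query, can be sketched as below. The raw listing format (directory headers ending in a colon, as produced by a recursive `ls`), the record layout, the host name, and the function names are all assumptions for the example; the article does not specify Archie's internal formats.

```python
import fnmatch


def parse_recursive_listing(site, raw_listing):
    """Convert an `ls -R` style listing into (file name, site, directory) records."""
    records = []
    directory = ""
    for line in raw_listing.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            directory = line[:-1]  # a header such as "/pub/x11:" starts a new directory
        else:
            records.append((line, site, directory))
    return records


def find_files(database, pattern):
    """Return records whose file name matches a shell-style pattern, ignoring case."""
    return [
        record
        for record in database
        if fnmatch.fnmatchcase(record[0].lower(), pattern.lower())
    ]


# Hypothetical raw listing fetched from a hypothetical archive site.
raw = "/pub/wais:\nREADME\nwais-src.tar.Z\n\n/pub/x11:\nxarchie.tar.Z\n"
database = parse_recursive_listing("ftp.example.org", raw)
hits = find_files(database, "*archie*")
```

Keeping the site and directory next to each file name is what lets a query answer with locations rather than bare names.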



The UAC lets Internet users access 
and query Archie servers. Currently, 
access is possible via electronic mail,
Telnet, and Prospero. 5 E-mail users can 
submit a query by sending a message to 
an Archie server, which sends a mes- 
sage back to the user with the query 
results. Telnet users connecting to an
Archie server through the telnet
command can submit queries to both
databases. Each telnet session requires a
significant amount of server resources.
For this reason, Archie currently uses
Prospero as a front end for each Archie
server. The Prospero server lets users
access Archie databases without
logging directly on to the Archie server.
The Prospero interface also lets Archie
clients use the Prospero protocol. One
such client is Xarchie, which provides a
point-and-click interface to Archie. In
the commercial version Archie 3.0, all 
available interfaces are clients of the 
Prospero interface, which communicates 
with Archie servers that have Prospero 
front ends. 

Archie session. Figure 4 shows an 
Archie session using the Xarchie client. 
In the upper window, the user specifies 
a pattern to be matched. Archie then 
returns a list of archival sites whose 
names match the pattern. As illustrated 
by the bottom window, when the user
selects a specific site, the file's complete
path name, size, access permissions, and
date of last modification are also
displayed (see the lower portion of the
bottom window). Users can then
request that the file be "FTP'ed" to their
local site.

Traffic. Archie consists of a dozen 
servers around the Internet. Originally, 
Archie servers kept their databases
consistent by copying the site-listing
information from the main server in
Montreal, Canada. According to Emtage and
Deutsch, 2 the update mechanism and
user queries to the main Archie server
generated approximately 50 percent of
all Montreal-bound Internet traffic.
Consequently, the task of polling the
participating FTP sites is currently distributed
among various Archie servers, which
execute a flooding-based
consistency-maintenance protocol among themselves.

Archie's developers have described 
it as a low-tech solution to the resource 
discovery and information-retrieval 
problem. Archie's simplicity and use of
existing mechanisms have been key to
its success. After approximately two
years in service, with over 1,000 regis- 
tered FTP archive sites that offer some 
2,100,000 files and 3,500 software pack- 
ages, Archie is accessed from 47 coun- 
tries about 50,000 times per day. 



Prospero 



The Prospero File System, a tool for
organizing distributed information, was 
originally developed at the University 
of Washington by Clifford Neuman. It 
lets users build customized "views," or 
virtual systems, of directories located 
throughout the Internet. 

Organization. The Prospero name 
space forms a generalized directed graph,
in which intermediate nodes are direc- 
tories, leaves are files, and edges are 
Prospero links. Just like traditional dis- 
tributed file systems, subtrees of the 
Prospero name space can be stored on 
different Prospero servers. A user's 
name space corresponds to the subgraph 
starting at a particular node, which serves 
as the root of the user's name space. 

Users organize their name space by 
building views, which are essentially 
directories composed from various 
sources that include the user's own views 
and imported views from other users. 
These directories can reside on differ- 
ent Prospero servers. 

In Prospero, an index is a special type 
of view that returns a directory of ob- 
jects that satisfy some query. It allows 
Prospero users to access other search 
engines. For example, the Prospero- 
Archie interface lets users build views 
containing directories with objects re- 
sulting from Archie queries. 

Users find information by navigating 
through available views. The Prospero 
client provides users with navigational 
tools analogous to the ones provided by 
traditional file systems. One command 
allows users to change the current
virtual directory to the one specified in the
command. Another command displays 
the name of the current virtual directo- 
ry and describes its physical location. 
Users can also list the contents of a 
virtual directory. 
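
The three navigation commands can be modeled with a small in-memory name space, as in the sketch below. The class names, host names, and paths are hypothetical, and a real Prospero client resolves each step by contacting the server that stores the directory rather than following Python references.

```python
class VirtualDirectory:
    """A directory node: where it is physically stored, plus its named links."""

    def __init__(self, host, local_name):
        self.host = host
        self.local_name = local_name
        self.links = {}  # link name -> VirtualDirectory, or a (host, path) file leaf


class Session:
    """Tracks the current virtual directory, starting at the user's root."""

    def __init__(self, root):
        self.path = []
        self.current = root

    def cd(self, name):
        target = self.current.links[name]
        if not isinstance(target, VirtualDirectory):
            raise NotADirectoryError(name)
        self.path.append(name)
        self.current = target

    def pwd(self):
        """Return the virtual name and the physical (host, local name) location."""
        return "/" + "/".join(self.path), (self.current.host, self.current.local_name)

    def ls(self):
        return sorted(self.current.links)


# Hypothetical name space spread over two servers.
root = VirtualDirectory("server-a.example", "/views/root")
papers = VirtualDirectory("server-b.example", "/views/papers")
papers.links["survey.ps"] = ("ftp.example.org", "/pub/survey.ps")
root.links["papers"] = papers

session = Session(root)
session.cd("papers")
```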

Prospero session. Figure 5 shows a
sample session that
illustrates the Prospero-Archie interface.
Starting at the root of the Prospero name
space, the user can then change the
current virtual directory, display its name,
and list its contents by using the cd, pwd,
and ls commands. To submit an Archie
request to find all file names that match
the string "wais," the user changes the
directory to /Archie/wais and lists the
results by using the ls command.

Figure 4. An example Archie session using Xarchie.

Links. When users find an interesting 
object, they can include it in their view 
by linking to it. The Prospero client
provides commands to add and delete 
links from the current node in the user's 
name space to or from a target node or 
leaf. A Prospero link specifies the name 
of the host where the object is stored 
and the local name of the object on that 
host. If the link's target is a directory, 
the link provides information to resolve 
a name in that directory by querying
the corresponding server. For files,
associated links contact the
appropriate server to provide access-mode
information. Prospero supports Sun's
Network File System, the Andrew File 
System, anonymous FTP, Gopher, and 
WAIS. 

A link also provides information such 
as its type and associated filters. Special 
links allow the contents of the target
directory to be included in the directory 
containing the link. By associating fil- 
ters with links, users can build custom- 
ized views from existing ones. A filter 
customizes the target view by reorga- 
nizing or extracting parts of it. Listing a 
Prospero view requires a computation 
distributed across all nodes reachable 
by transitive closure of all the view's 
links, indexes, and filters. 
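
The way filters and inclusion links combine when a view is listed can be sketched as a recursive walk. The dictionary layout, the link type name "union", and the sample filter are assumptions made for this illustration, and the sketch assumes the views form no cycles, which a general Prospero name space does not guarantee.

```python
def list_view(view):
    """Resolve a view into {name: target}, expanding inclusion links and filters.

    Assumes the graph of views is acyclic; a cycle would recurse forever.
    """
    listing = {}
    for link in view["links"]:
        if link["type"] == "union":
            # Include the contents of the target directory in this directory.
            entries = list_view(link["target"])
        else:
            entries = {link["name"]: link["target"]}
        for link_filter in link.get("filters", []):
            entries = link_filter(entries)
        listing.update(entries)
    return listing


def only_postscript(entries):
    """Hypothetical filter that extracts the PostScript files from a view."""
    return {name: target for name, target in entries.items() if name.endswith(".ps")}


# A view shared by another user, and the user's own view built on top of it.
shared = {"links": [
    {"type": "file", "name": "a.txt", "target": ("h1.example", "/a.txt")},
    {"type": "file", "name": "b.ps", "target": ("h1.example", "/b.ps")},
]}
mine = {"links": [
    {"type": "file", "name": "notes", "target": ("h2.example", "/notes")},
    {"type": "union", "target": shared, "filters": [only_postscript]},
]}
resolved = list_view(mine)
```

Because each inclusion link is resolved by listing its target, the work naturally spreads to every view reachable from the one being listed.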
% cd /

% ls

Figure 5. A sample Prospero session.



Virtual system registration. Users 
"advertise" information by registering 
their virtual systems with the Prospero 
server administrator. The administra- 
tor creates a link to the new virtual 
system in the master view of virtual 
systems so that other users can see, nav- 
igate, and link to accessible portions of it. 

Usage. Currently, there are close to 
50 Prospero servers. Most users (from 
more than 10,000 systems in 30 different 
countries) run as Prospero clients. 

Gopher 

Gopher, developed at the University
of Minnesota, lets users search and
browse distributed information on the
Internet. Gopher organizes information
into a hierarchy in which intermediate
nodes are directories, or indexes, and
leaf nodes are documents. (Actually,
the Gopher information space is a
directed graph, since it allows cycles.)
Users navigate the Gopher information 
space guided by the Gopher client's hi- 
erarchical menu system. 

Gopher architecture. The architecture,
as shown in Figure 6, consists of clients
and servers communicating through the
Gopher protocol, which is implemented
on top of TCP/IP (transmission control
protocol/Internet protocol).

The root of Gopher's hierarchy
resides on host rawBits.micro.umn.edu
at the University of Minnesota. This is
the default directory retrieved by
Gopher clients when they are first invoked.
Clients can also be configured with oth- 
er entry points into the hierarchy. The 
Gopher root server has knowledge of 
all top-level services and advertises their 
existence to users. The architecture con- 
tains essentially one top-level server per 
participating organization, such as uni- 
versity campuses, private corporate in- 
stitutions, or government agencies. Low- 
er level servers can be linked to the 
corresponding top-level server, so that 
once users find the appropriate top-level
server, they can navigate through the 
hierarchy by following the links to the 
lower level servers. 

For example, university campuses 
running Gopher servers may register a 
central top-level server with the root
server. Each university's central Go- 
pher server can link to existing depart- 
mental servers, which in turn can link to 
lower level servers. 

Gopher objects are identified by type, 
user-visible name, server's host name 
and port number, and the object's abso- 
lute path name within the server file 
system. The user selects an object on 
the basis of its user-visible name, and 
the client retrieves it by constructing a 
"handle" from the server's host name, 
its port number, and the object's path 
name. Users can then navigate through
the available information space, which
contains

• full-text document objects, which
are stored as files in the corresponding
servers, and

• directory objects that can be
distributed across multiple servers.
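
To make the object attributes and the handle concrete, the sketch below parses one line of a Gopher menu. It relies on the tab-separated line layout defined in the Gopher protocol specification (RFC 1436) rather than on anything stated in this article; the tuple name, the function names, and the host are illustrative.

```python
from collections import namedtuple

GopherItem = namedtuple("GopherItem", "item_type name selector host port")


def parse_menu_line(line):
    """Split a menu line: type character and visible name, selector, host, port."""
    fields = line.rstrip("\r\n").split("\t")
    item_type, name = fields[0][0], fields[0][1:]
    return GopherItem(item_type, name, fields[1], fields[2], int(fields[3]))


def handle(item):
    """Build the handle a client needs to retrieve the object."""
    return (item.host, item.port, item.selector)


# Type "1" marks a directory; "0" marks a text file and "7" a search index.
item = parse_menu_line("1Libraries\t/Libraries\tgopher.example.edu\t70\r\n")
```

The user only ever sees `item.name`; the client connects to the host and port in the handle and sends the selector string to fetch the object.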

Full-text search operations can also 
be performed: Gopher's search servers 
maintain full-text inverted indexes of 
subsets of the documents stored in a 
Gopher server. Search servers can be 
configured to index more than one 
server. For instance, an Internet
request for comments (RFC) full-text
search server indexes all existing RFCs
and executes keyword searches on their
contents. A full-text search server
returns to the client handles to
documents that match a Boolean search
pattern. Gopher clients can also
retrieve objects from WAIS, Archie, and
FTP servers.

Gopher session. Figure 7 illustrates a 
COMPUTER 
sample Gopher session. Starting at the
root directory, the user traverses the 
Gopher information space by selecting 
interesting directories, such as Librar- 
ies/ and Library of Congress Records/, 
or executing full-text searches on in- 
dexes like "search Library of Congress 
records from 12-91 to present." 

World-Wide Web 

The World-Wide Web, developed at
CERN in Switzerland, merges informa- 
tion discovery and hypertext techniques. 
It organizes data into a distributed hy- 
pertext in which nodes are either full- 
text objects, directory objects called cov- 
er pages, or indexes. WWW also supports 
full-text searches over documents stored 
at a particular server. 

WWW architecture. The architecture 
is based on the client-server model, with 
WWW clients providing users with a 
hypertext-like browsing interface. Be- 
sides its native hypertext transfer pro- 
tocol (HTTP), WWW clients understand 
FTP and the network news transfer pro- 
tocol (NNTP). FTP lets users access file 
archives on the Internet, where file di- 
rectories are browsed as hypertext ob- 
jects. NNTP allows access to Internet 
news groups and news articles. News 
articles often contain references to oth- 
er articles or news groups, which are 
represented as hypertext links. 

HTTP allows document retrieval and 
full-text search operations. It runs on top 
of TCP and maps each request to a TCP 
connection. HTTP objects are identified 
by the HTTP protocol type, the corre- 
sponding server's name, and the path 
name to the file where the object's
contents reside. Parts of documents can also
be specified. If a search operation is re- 
quested, the HTTP object identifier car- 
ries the set of specified keywords instead 
of a path name. Future implementations 
of the HTTP protocol will include data- 
format negotiation between client and 
server. Currently, only plain text and 
simple hypertext formats (Hypertext 
Mark-up Language, or HTML) are im- 
plemented. 
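
The parts of an HTTP object identifier can be pulled apart with Python's standard URL parser, as in the sketch below. It uses present-day URL syntax, in which search keywords travel after a "?" and a document part after a "#"; that syntax, the host name, and the paths are assumptions of the example and may differ from the identifier format in use when this article was written.

```python
from urllib.parse import urlsplit


def describe(identifier):
    """Break an identifier into protocol, server, and either a path or keywords."""
    parts = urlsplit(identifier)
    description = {"protocol": parts.scheme, "server": parts.hostname}
    if parts.query:
        # A search request carries keywords, joined here with "+".
        description["keywords"] = parts.query.split("+")
    else:
        description["path"] = parts.path
        description["part"] = parts.fragment or None
    return description


# Hypothetical identifiers for a document part and for a keyword search.
document = describe("http://info.example.org/hypertext/WWW/TheProject.html#intro")
search = describe("http://info.example.org/index?resource+discovery")
```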

Discovery trees. Information accessi- 
ble through WWW can be seen through 
three discovery trees. The first one clas- 
sifies information by subject. An entry 
in the WWW root directory links the 
directory to the current
subject-classification tree. This classification includes
topics such as aeronautics, astronomy,
biological sciences, computer sciences,
and humanities. This information is
spread across all kinds of servers,
including WAIS, Gopher, and WWW.
Because WWW was created for the
high-energy physics community, a special
entry in the root directory links to a
cover page that contains the existing
HTTP servers specializing in the
subject. As they become available, indexes
to other disciplines are added by the root
directory's database administrator.

Figure 6. The Internet Gopher architecture.

The second WWW discovery tree
classifies by server type. The cover page
corresponding to this classification lists
all servers available through WWW. This
includes entries for WAIS, Gopher,
NNTP, and WWW servers. There is even
an entry for anonymous FTP sites that
can be searched through Archie. The
third tree, which classifies by
organization, is not very populated in the sense
that it does not contain a great deal of
information.

Personalized web. The WWW
discovery trees correspond to the different
ways information is organized and can
be discovered. Discovery sessions
involve users starting on their home cover
page, following a link to an index,
executing a search, and following the
resulting links. As users find interesting
information, they build their
personalized web by linking to nodes in the
global web. Currently, the default cover
page, which resides on host info.cern.ch
and represents the root of the WWW
information space, is the one retrieved
by the WWW client when it is invoked.
However, users can customize their
home cover page so that they can start
anywhere in the WWW information
space. Figure 8 presents an example
session.

Publishing information. Making in- 
formation available through WWW can 
involve a similar discovery task. The 
information publisher tries to find the 
appropriate cover page to reference the 
new data. Then the publisher contacts
the person responsible for that cover 
page, who adds a link to the new data. 
The publisher can also run a new server, 
which requires the new server's admin- 
istrator to contact WWW administra- 
tors to add the new server to the list. 
Servers and hosts. Currently, besides
WAIS and Gopher servers, there are
about 24 WWW servers accessible to
WWW clients. The WWW server on
info.cern.ch has logged access from
approximately 6,000 hosts that use their
own WWW clients or connect to the
publicly available client.

Resource discovery at 
the University of 
Colorado 

The Resource Discovery project at 
the University of Colorado-Boulder is 
best known for Netfind 6 but has also 
investigated various issues in the re- 
source discovery arena. 



Netfind. This white pages directory 
tool tries to locate information about an 
Internet user when given the user's name 
and organization. A successful Netfind 
query, like the one shown in Figure 9,
returns information such as the user's
e-mail address and telephone number.

Netfind builds its indexing database,
called the seed database, by using data
scattered across multiple existing
sources, such as network news messages, the
domain naming system (DNS), the
simple mail transfer protocol (SMTP), and
the finger utility. The seed database
keeps organization names, city names,
and corresponding host names gathered
from news message headers. On the
basis of organization names and city
names provided to Netfind, matching
host names are selected from the seed
database. DNS is then used to locate
authoritative name servers for the
domains to which the selected hosts
belong. Each name server located is
queried by using SMTP to find
mail-forwarding information about the specified
user. If found, the corresponding hosts
are probed by using the Unix finger
utility. To improve Netfind's response
time and increase its resiliency to host
and network failures, up to 10 threads
can be used to allow sets of DNS, SMTP,
and finger-query sequences to be
executed in parallel.
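
The parallel search can be sketched with a thread pool capped at 10 workers. This is only an outline under stated assumptions: the seed database layout, the function names, and the two lookup callables are hypothetical, the DNS and SMTP steps are folded into a single caller-supplied `find_mail_host` function, and the sample data replaces all network traffic with stubs.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 10  # the article's upper bound on parallel lookup sequences


def search_domain(user, domain, find_mail_host, finger):
    """Run one lookup sequence: find where mail is forwarded, then probe that host."""
    mail_host = find_mail_host(user, domain)
    if mail_host is None:
        return None
    return finger(user, mail_host)


def netfind(user, keys, seed_db, find_mail_host, finger):
    """Select domains whose seed-database words include every key, then probe them."""
    domains = sorted(d for d, words in seed_db.items() if all(k in words for k in keys))
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        results = pool.map(
            lambda domain: search_domain(user, domain, find_mail_host, finger), domains
        )
        return [result for result in results if result is not None]


# Stub data and lookups standing in for the seed database, SMTP, and finger.
seed_db = {
    "cs.example.edu": {"example", "university", "springfield"},
    "example.co.jp": {"example", "corporation", "yokohama"},
}


def stub_mail_host(user, domain):
    return "mail." + domain if domain == "cs.example.edu" else None


def stub_finger(user, host):
    return {"login": user, "host": host}


found = netfind("kim", ["example"], seed_db, stub_mail_host, stub_finger)
```

A slow or unreachable host delays only its own lookup sequence, which is the resiliency benefit the paragraph above attributes to running the sequences in parallel.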

Other projects. Other resource
discovery-related projects under
development at the University of Colorado
include a network visualization tool that
focuses on discovering information
about networks. Items include
topology, congestion, routing, and protocol
usage. A global e-mail study focuses on
providing ways of locating users with
particular interests or expertise on the
basis of e-mail pattern analysis.

Figure 7. A sample Gopher session.

Figure 8. A sample WWW session.

X.500 

The X.500 directory 7 resulted from 
standardization efforts in the field of 
directory services by the CCITT (Inter- 
national Consultative Committee for 
Telephony and Telegraphy) and the ISO 
(International Standards Organization). 

Attribute queries. Unlike the
domain-naming system, which basically maps
host names into corresponding Internet
addresses and vice versa, X.500 entries
consist of a set of attribute-value pairs.
X.500 accepts attribute-based queries.
The directory's name space is
hierarchically organized and distributed among
its servers. Administrative authority over
portions of the global name space is
delegated to different autonomous
organizations, which can transfer
authority over portions of their assigned
subtrees. As in DNS, portions of X.500's
name space are replicated on different
servers, which use a simple replication
mechanism based on designated slave
and master servers. Figure 10 illustrates
a sample X.500 session.
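
An attribute-based query over a hierarchical name space can be sketched with entries keyed by their full names. The attribute abbreviations (c, o, cn) follow common X.500 usage, but the directory contents, the string form of the names, and the function names are assumptions for this example, which also keeps everything on one machine rather than distributing subtrees among servers.

```python
def matches(entry, query):
    """True when the entry holds every queried attribute with the given value."""
    return all(
        entry.get(attribute, "").lower() == value.lower()
        for attribute, value in query.items()
    )


def search(directory, base, query):
    """Return the names of entries at or below `base` that satisfy the query."""
    return sorted(
        name
        for name, entry in directory.items()
        if (name == base or name.endswith("," + base)) and matches(entry, query)
    )


# Hypothetical directory: a country, an organization, and two people.
directory = {
    "c=US": {"objectClass": "country"},
    "o=Example University,c=US": {"objectClass": "organization"},
    "cn=Pat Lee,o=Example University,c=US": {
        "objectClass": "person", "surname": "Lee", "mail": "pat@example.edu",
    },
    "cn=Sam Lee,o=Other Corp,c=US": {"objectClass": "person", "surname": "Lee"},
}
in_university = search(directory, "o=Example University,c=US", {"surname": "lee"})
in_country = search(directory, "c=US", {"objectClass": "person"})
```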

The Internet offers numerous freely
distributable X.500 software packages.
Some of the X Window System
interfaces are Bellcore's Xdi, CSIRO's xdua, and
the University of Wisconsin's xwp. An
X.500-specific WAIS database called
x.500.working-group.src provides more
information on X.500 software and
documentation.



Distributed Indexing, or Indie for
short, is a distributed information
discovery-and-retrieval architecture we
developed at the University of
Southern California. Indie consists of a
replicated directory of services and a
collection of broker databases. Brokers
automatically cluster references to
related information by indexing their own
data, as well as data stored in other 
brokers, databases, and other discovery 
tools. As in Archie, this clustering of 
indexing information lets users efficient- 
ly search all participating databases. In 
addition, because it was built on top of 
the Distributed Hypertext (DHT) sys- 
tem data model and communication pro- 
tocol, Indie inherits the organizational 
capabilities of hypertext systems. 



caldera.usc.edu% telnet bruno.cs.colorado.edu
login: netfind

Figure 9. A sample Netfind session.
Object descriptors. An Indie broker
stores descriptors of objects relevant to
the topic in which it specializes. It
extracts them from other Indie brokers
and primary data sources that it
indexes. An object descriptor contains an
arbitrary number of attribute-value pairs
describing the object. Examples of
attributes that constitute an object
descriptor include bibliographic
information about the object such as the name
of the author(s), the document title,
and publication date.

The object descriptor includes
technical data about the object, such as
object type, object identifier, and the time
stamp assigned to each object by the
database that created it. It also includes
some attributes used by Indie's
consistency-maintenance mechanism. The
object descriptors need not include the
object itself. This lets a service advertise
an object while retaining access control.

Generator objects. A generator
object describes an Indie broker. The
generator consists of a textual abstract and
a Boolean expression over a set of
bibliographic fields (we call the Boolean
expression the generator rule). It also
contains fields such as the broker's
location, the size of its database, and the
object identifier corresponding to the
generator object. The name generator
rule expresses the idea that a broker's
database is generated and periodically
updated by evaluating the rule over a
number of other brokers and primary
data sources.
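
Evaluating a generator rule over object descriptors can be sketched as follows. The nested-tuple encoding of the Boolean expression, the "contains" test, and the sample descriptors are assumptions made for illustration; the article does not give Indie's rule syntax.

```python
def evaluate(rule, descriptor):
    """Evaluate a Boolean rule, written as nested tuples, against one descriptor."""
    operator = rule[0]
    if operator == "contains":
        _, field, word = rule
        return word.lower() in str(descriptor.get(field, "")).lower()
    if operator == "not":
        return not evaluate(rule[1], descriptor)
    if operator == "and":
        return all(evaluate(sub_rule, descriptor) for sub_rule in rule[1:])
    if operator == "or":
        return any(evaluate(sub_rule, descriptor) for sub_rule in rule[1:])
    raise ValueError("unknown operator: " + str(operator))


def generate(rule, sources):
    """Build a broker's database from the descriptors, in every source, that match."""
    return [d for source in sources for d in source if evaluate(rule, d)]


# Hypothetical descriptors held by two other brokers or primary data sources.
sources = [
    [
        {"title": "Resource discovery on the Internet", "author": "Lee", "year": 1993},
        {"title": "Compiler design", "author": "Kim", "year": 1990},
    ],
    [{"title": "Discovery of internet resources", "author": "Park", "year": 1992}],
]
rule = ("and", ("contains", "title", "discovery"), ("not", ("contains", "author", "park")))
broker_database = generate(rule, sources)
```

Rerunning `generate` with the same rule is the periodic update: the broker's database is whatever the rule currently selects from its sources.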

To become visible to users and other
brokers, all brokers register their
generator object with Indie's directory of
services. A broker can register itself
with any replica of the directory of
services. As part of the registration
procedure, the selected replica returns a list
of other generator objects pertinent to
the new broker. The broker stores this
list in its registration table and refers to
it when choosing the set of databases to
index. The directory of services replica
also reports changes to the broker's
registration table as new brokers join
in or participating brokers cease to exist.
From time to time, broker
administrators use the administrator's tool to
refer to the corresponding registration
table and decide whether to index new
brokers.

Brokers indexed by other brokers
store the indexing brokers' generator
objects in their trigger table. The name
"trigger table" refers to active
databases in which specific rules are triggered
and evaluated when the database
changes in specific ways. When one broker
registers its generator object with
another broker, the indexed broker
executes the generator rule and reliably
forwards the retrieved set of object
descriptors to the indexing broker.
Afterwards, adding or deleting objects from
the indexed broker's database can trig- 

COMPUTER 



Welcome to the University of Colorado Netfind server. = ■'. 



Alternate Netfind servers: 

archie.au (AARNet, Melbourne. Australia) 
bnino.cs.colorado.edu (University of Colorado. Boulder) 

su.uakom.es (Slovak Academy of Sciences, Czech and Slovak Fed. Repub.) 

Top level choices: 

1. Help O 

2. Search \";> 

3. Seed database lookup . 1*; 

4. Options 

5. Quit (exit server) 

->2 /• " ' 

Enter person and keys (blank to exit) — > obraczka use 

There are too many domains in the list. Vr':: ' ' ; ; 

Please select at most 3 of the following: ' ' \. ' ' ' ' 

0. hsc.usc.edu (university of southern California, los artgcles) ' • : J 

1. hsc.usc.ed.ln.net (los nettos. use information sciences institute, marina del rey. California) 

2. In.net (los nettos, use information sciences institute, marina del rey ? California) ..... . 

3. usc.vcu.edu (Virginia common wealth university, richmond) 

4. usc.edu (university of southern California, los angelesV'^ 

5. usc.co.jp (use corporation, yokohama. japan) . .. 

6. usc.ed.ln.net (los nettos. use information sciences institute, marina del rey. California) 

7. usc.cln.net (los nettos, use information sciences institute, marina del rey. California) 

8. usc.es (unspecified) 
Enter selection (e.g., 3 I 2) — > 4 

( 0) check_name: checking domain usc.edu. Level = 0 

MAIL IS FORWARDED TO obraczka@chaph.usc.edu 

NOTE: this is a domain mail forwarding arrangement - so mail intended 

for "obraczka** should be sent to "obraczka^usc.ed'j" * 

rather than 41 obraczka@chaph.usc.edu". 

( 0) check_name: checking host chaph.usc.edu. Level = 0 . v 
SYSTEM: chaph.usc.edu 

Login name: obraczka In real life: Katia Obraczka 

Directory: rtiome/chaph2/obraczka Shell: /usr/local/etc/notallowed 

Last login Sun Apr 7. 1991 on ttypl from jcrico 

No unread mail 

No Plan. 

( 0) Attempting finger to current indication of most recent "Last login" machine jerico.usc.edu 
( 0) check_name: checking host jerico.usc.edu. Level = 1 

SUMMARY: 

- Among the machines searched, the machine from which user 

"obraczka" logged in most recently was jerico.usc.edu. . 

on Sun Apr 7, 1991. Mr 
-The most promising email address for "obraczka** 
based on the above search is 
obraczka@usc.edu. 
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ger the evaluation of generator rules in 
its trigger table, causing these changes 
to be forwarded to the corresponding 
indexing brokers. 

The directory of services is simply a 
specialized broker. When a broker reg- 
isters itself with a replica of the directo- 
ry of services, the broker's generator 
object is stored in that replica's trigger 
table. Only the updates to the directory 
of services that trigger this rule are for- 
warded to the broker. 
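
The trigger-table behavior can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: rules are plain predicates, the network is replaced by an in-memory `outbox` list, and reliable delivery is not modeled. None of the names come from the Indie implementation.

```python
from typing import Callable, Dict, List, Tuple

Descriptor = Dict[str, str]
Rule = Callable[[Descriptor], bool]
Message = Tuple[str, str, Descriptor]   # (indexing broker, operation, object)


class IndexedBroker:
    def __init__(self) -> None:
        self.database: List[Descriptor] = []
        self.trigger_table: Dict[str, Rule] = {}   # indexing broker -> rule
        self.outbox: List[Message] = []            # stand-in for the network

    def register_indexer(self, indexer: str, rule: Rule) -> None:
        """Store the rule, then forward everything that already satisfies it."""
        self.trigger_table[indexer] = rule
        for d in self.database:
            if rule(d):
                self.outbox.append((indexer, "add", d))

    def add(self, d: Descriptor) -> None:
        self.database.append(d)
        self._fire("add", d)

    def delete(self, d: Descriptor) -> None:
        self.database.remove(d)
        self._fire("delete", d)

    def _fire(self, op: str, d: Descriptor) -> None:
        """Evaluate every stored rule against the change and forward matches."""
        for indexer, rule in self.trigger_table.items():
            if rule(d):
                self.outbox.append((indexer, op, d))
```

Under this model a directory of services replica is the same class with generator objects as its database, which mirrors the remark that the directory is simply a specialized broker.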

Gateways. Non-Indie servers attach
to Indie through a gateway broker. In- 
die provides a gateway library consist- 
ing of a set of routines which non-Indie 
servers can call to communicate with 
their gateway broker. The gateway bro- 
ker itself is just a normal Indie broker 
modified to communicate with non-In- 
die servers. It manages a trigger table, a 
registration table, and the interface to 
the directory of services. A non-Indie
server can efficiently attach itself to In- 
die by making appropriate calls to the 
gateway library. Otherwise, the gate- 
way broker must communicate with the 
non-Indie server in its native protocol 
and poll for updates by periodically ex- 
tracting indexing information from the 
server. 
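
When a server does not call the gateway library, the gateway broker has to discover changes by polling. One simple way to picture this, assuming the server can at least list its object identifiers, is to compare successive snapshots. The function below is a hypothetical sketch, not part of Indie's gateway library.

```python
from typing import Callable, Set, Tuple


def poll_once(list_objects: Callable[[], Set[str]],
              known: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare the server's current object identifiers with the last snapshot.

    Returns (added, deleted, new_snapshot). `list_objects` stands for
    whatever native-protocol call extracts the indexing information.
    """
    current = set(list_objects())
    added = current - known
    deleted = known - current
    return added, deleted, current
```

The gateway broker would call `poll_once` periodically and feed the added and deleted identifiers into its own database, which in turn drives its trigger table.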

Indie architecture. Figure 11 illustrates
an example indexing configuration and 
its operation. Suppose that a user wants 
to find all recent technical publications 
on distributed operating systems. The 
user interface submits the correspond- 
ing user query to a replica of the direc- 
tory of services, which evaluates the 
query against its generator object data- 
base. This computation produces a list 
of brokers whose descriptions are perti- 
nent to the user query. In this example, 
the directory of services could return a 
reference to the operating systems bro- 
ker and the distributed systems broker. 
The user interface ranks the list of tar- 
get brokers according to interest. This 
ranking procedure could be based on 
counting the number of keyword match- 
es in the broker descriptor. In this case, 
the user interface could send the query 
to both brokers, who — after evaluating 
the user query on their indexing infor- 
mation database — return a list of ap- 
propriate object descriptors. On the basis 
of this information, the user can choose 
to retrieve a copy of one or more inter- 
esting articles from one or more full- 
text retrieval systems. 
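
The ranking step mentioned above, counting keyword matches in each broker descriptor, can be sketched as follows. The whitespace tokenization and the tie-breaking order are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def rank_brokers(query: str,
                 descriptors: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (broker, match count) pairs, best match first."""
    keywords = set(query.lower().split())
    scored = []
    for broker, text in descriptors.items():
        # Count every descriptor word that is also a query keyword.
        score = sum(1 for word in text.lower().split() if word in keywords)
        scored.append((broker, score))
    # Highest score first; ties are broken alphabetically by broker name.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

A user interface could then send the query to every broker whose score is above zero, or only to the first few entries of the ranked list.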



For instance, a user decides to re-
trieve a copy of an interesting report
from the University of California at Los
Angeles. The user interface contacts
the corresponding gateway to retrieve
the selected object. The user can also 
permanently link any node in the user 
information space to the UCLA techni- 
cal report for future use. 

Welcome to Dish (Directory SHell)

Dish -> squid

Connected to Incan Speckled Iguana at '0101'H/Internet=128:129.32.31+17003
Current position: @c=US@st=California@o=Information Sciences Institute
User name: @
Current sequence: default

Dish -> list

1 organizationalUnitName=Business Office
2 organizationalUnitName=HPCC
3 organizationalUnitName=Info Processing Center
4 organizationalUnitName=Integrated Systems
5 organizationalUnitName=Intelligent Systems
6 organizationalUnitName=Silicon Systems
7 organizationalUnitName=Software Sciences
8 commonName=Manager
9 commonName=Postmaster

Dish -> moveto 2

Dish -> search Hotz

objectClass - organizationalPerson & pilotObject & newPilotPerson
& quipuObject
commonName - Steven Hotz
commonName - hotz
surname - Hotz
postalAddress - Information Sciences Institute
Suite 1001
4676 Admiralty Way
Marina del Rey, CA
90292
telephoneNumber - +1 310-822-1511 x402
facsimileTelephoneNumber - +1 310-823-6714
userid - hotz
rfc822Mailbox - hotz@venera.isi.edu
otherMailbox - internet: hotz@venera.isi.edu

Dish -> quit

Figure 10. A sample X.500 session.

Consistency mechanism. Indie addresses database consistency and recovery
with a time-stamped augmented flooding algorithm that also permits
convenient recovery from network partition, operating system crashes, and
media failure of the broker's database. Indie's time stamp-based
consistency mechanism works as follows. All trigger table entries and
registration table entries are time-stamped. When an object is added to or
deleted from a broker, the change causes the broker to evaluate certain
rules stored in its trigger table against updates with time stamps earlier
than the trigger table entry. The broker forwards the appropriate changes
to each of its affected peers (that is, each registered indexing broker
corresponding to the triggered generator rules) and advances the trigger
table time stamps to the time stamp of the latest update. It then
transmits this time stamp to all affected peers, and they record it in the
appropriate registration table entry. When the indexed broker cannot
establish communication with a peer, it marks the trigger table entry as
out of date. Time stamps of rules neither triggered nor out of date are
advanced to the current time. Finally, peers occasionally poll one another
in an attempt to maintain consistency.
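
The time-stamp bookkeeping can be made concrete with a short Python sketch. It covers only the forwarding, the advancing of time stamps, and the out-of-date marking; recovery by polling is left out, and the decision to hold back further forwarding to an entry already marked out of date is an assumption of this example rather than something the description specifies. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Descriptor = Dict[str, str]


@dataclass
class Peer:
    name: str
    reachable: bool = True
    received: List[Descriptor] = field(default_factory=list)
    registration_stamp: float = 0.0   # stamp recorded in the registration table


@dataclass
class TriggerEntry:
    peer: Peer
    rule: Callable[[Descriptor], bool]
    stamp: float = 0.0
    out_of_date: bool = False


def apply_update(entries: List[TriggerEntry], change: Descriptor,
                 update_stamp: float, now: float) -> None:
    """Process one added or deleted object against the trigger table."""
    for e in entries:
        if e.rule(change):
            if e.peer.reachable and not e.out_of_date:
                e.peer.received.append(change)            # forward the change
                e.stamp = update_stamp                    # advance to latest update
                e.peer.registration_stamp = update_stamp  # peer records the stamp
            else:
                # Peer cannot be brought up to date now; recovery (polling)
                # is outside the scope of this sketch.
                e.out_of_date = True
        elif not e.out_of_date:
            # Rule neither triggered nor out of date: advance to current time.
            e.stamp = now
```

Keeping the stamp of an out-of-date entry frozen is what later lets a recovering peer ask for everything that changed since that stamp.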

Replication. Indie's indexing mecha- 
nism causes "lazily consistent" broker 
replication as a side effect. To replicate 
a broker, we create a new broker to 
serve as the replica, assign the replica 
the same generator rule as the broker to 
be replicated, and have the replica in- 
dex the broker or some of its other 



replicas. Since the replica shares the 
same generator rule as the primary copy, 
it fills with the same data. Indie's up- 
date-and-recovery algorithm guarantees 
that all replicas eventually learn of the 
update. 

Replication in the context of the directory of services considers all
replicas equal. This means that there is no primary copy of the directory
of services. Instead, clients register or unregister themselves with the
replica of their choice. All replicas participate in a flooding-based
consistency maintenance mechanism to keep their databases consistent.

Status. We have completed Indie's first implementation phase. Currently,
the prototype consisting of Indie brokers, a centralized directory of
services, and gateways to FTP archives runs at the University of Southern
California's Network and Distributed Systems Laboratory. We are now
implementing Indie's consistency maintenance and replication mechanisms.



Other research 
initiatives 



Other discovery tools include the
Knowbot Information Service,9 Alex,10
Semantic File Systems,11 and Nomenclator.12

[Figure 11. Indie operation. Diagram labels: directory of services, user
query, objects, user interface, databases, Gateways 1 through 7, university
computer science technical reports, Tandem products, Imperial College
technical reports. Legend: SOSP = Symposium on Operating Systems
Principles; TOCS = Transactions on Computer Systems.]

Knowbot Information Service. The Digital Library System (DLS), proposed by
the Corporation for National Research Initiatives, is an open architecture
whose goal is to integrate access to existing and future information
sources on the Internet. Knowbots, an abbreviation for Knowledge Robots,
are active components of the DLS. A Knowbot is an intelligent program that
can exchange messages with other Knowbots, move and replicate itself around
the system, and search and manipulate objects. The Knowbot Information
Service (KIS) is an Internet directory service that understands a number of
other directory services, such as X.500, and queries these services on
behalf of users.

Alex. This file system developed by 
Vincent Cate at Carnegie Mellon Uni- 
versity provides users with transparent 
read access to files at anonymous FTP 
sites. Users can see the collection of 
Internet anonymous FTP sites and their 
corresponding directories and files as a 
hierarchical file system. Intermediate 
nodes are Internet domains, hosts, or 
directories within hosts, and leaves are 
files. Using standard filesystem com- 
mands, users can browse through this 
hierarchy to retrieve files. To obtain 
reasonable performance, Alex caches 
information such as machine names, 
directory information, and the contents 
of remote files. It also implements a soft 
consistency mechanism that guarantees 
that updates include all that occurred in 
the last 95 percent of the reported age of 
the file. Alex is currently implemented 
as an NFS server and already integrates 
access to Archie. A WAIS database that
indexes Readme files scattered throughout
the Alex space serves as a substitute
for Archie's whatis database. Cate has
also built a WAIS database of computer
science technical reports available
through Alex.

Semantic File Systems. Developed by 
the Programming Systems Group at the 
MIT Laboratory for Computer Science, 
Semantic File Systems integrate asso-
ciative access into a traditional tree-
structured file system. Associative ac-
cess is achieved by providing file systems
with an attribute extraction and query 
interface. Filters called transducers ex- 
tract attributes. A transducer takes as 
input the contents of a file or a directory 
and produces as output the identifiable 
objects and their corresponding at-
tributes. An object can correspond to
an entire directory, a file, or portions of 
a file — such as procedures in a source 
code file — or to individual messages in 
an e-mail file. Queries consist of Bool- 
ean combinations of the desired at- 
tribute-value pairs. Transducers and 
queries produce customized views of 
the file-system hierarchy called virtual 
directories that help locate and orga- 
nize information. A semantic file-sys- 
tem research prototype has been imple- 
mented on top of Sun NFS. 
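
To make the transducer and virtual-directory ideas concrete, here is a minimal Python sketch. It assumes a toy transducer that recognizes `def` lines in Python source files and a query that is a pure conjunction of attribute-value pairs; the MIT prototype supports richer transducers and queries, and none of these names are taken from it.

```python
import re
from typing import Dict, List, Set, Tuple

Attr = Tuple[str, str]                 # one attribute-value pair


def source_transducer(path: str, text: str) -> Dict[str, Set[Attr]]:
    """Hypothetical transducer: the file plus each procedure defined in it."""
    objects = {path: {("type", "file"), ("name", path)}}
    for name in re.findall(r"^def\s+(\w+)\s*\(", text, flags=re.MULTILINE):
        objects[f"{path}:{name}"] = {("type", "procedure"),
                                     ("name", name),
                                     ("file", path)}
    return objects


def virtual_directory(index: Dict[str, Set[Attr]], *wanted: Attr) -> List[str]:
    """Names of all objects that carry every requested attribute-value pair."""
    return sorted(obj for obj, attrs in index.items() if set(wanted) <= attrs)
```

Listing `virtual_directory(index, ("type", "procedure"))` plays the role of opening a virtual directory whose entries are the matching procedures rather than whole files.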



Nomenclator. Developed by Joann Ordille and Barton Miller at the
University of Wisconsin at Madison, Nomenclator implements attribute-based
naming on top of other naming systems. Its access functions are, in
essence, servers that periodically traverse the appropriate portions of
the underlying name space and other access functions. They build indexes
of the objects encountered that satisfy certain properties. The
Nomenclator client uses a directory of services called the active catalog
to identify access functions pertinent to a user query. Ordille and Miller
have built a Nomenclator prototype that uses X.500 as its underlying
information repository.

A taxonomy of resource
discovery services

We summarize the surveyed tools by 
presenting a taxonomy of approaches 
to the resource discovery problem. 
Schwartz et al. present another taxono- 
my.* 

Table 1 lists the features that we used 
to classify the surveyed discovery 
tools.



Table 1. A taxonomy for Internet resource discovery services.

Service               Query  Browse  Organize  Granularity                     Information space  Distribution  Directory    Preferred
                                                                               organization                     of services  interfaces
Prospero                     ✓       ✓         Files                           DG                 Distributed                File system
Gopher                       ✓                 Files                           DG                 Distributed                Curses-based
WWW                          ✓       ✓         Files and portions of files     DG                 Distributed                Hypertext-like
Semantic File System  ✓      ✓       ✓         Portions of files               DG
X.500                 ✓              ✓         Object descriptors              DG                 Distributed
Alex                         ✓       ✓         Files                           DG                 Distributed                NFS
Archie                ✓                        File names                      Indexes            Replicated                 Xarchie (via Prospero)
WAIS                  ✓                        Objects and object descriptors  Indexes            Distributed   ✓            Xwais
Netfind               ✓                        Information about users         Indexes            Replicated                 telnet
Indie                 ✓              ✓         Objects and object descriptors  Indexes            Distributed   ✓            Curses-based
Nomenclator           ✓                        Name server objects             Indexes            Distributed   ✓

DG = directed graph.
Browsing vs. indexing. Services like
Gopher and WWW provide users with a
browsing interface so that they can nav-
igate the available information space.
WWW is also an organizational tool.
Like Prospero and traditional file sys-
tems, WWW organizes its information
space by using links. Users can custom-
ize their home cover page to link to
interesting information anywhere in the 
WWW information space. However, 
users can customize only their starting 
point in the WWW space, having to 
follow existing links from then on. Be- 
cause it is file-system oriented, Prospe- 
ro is a more flexible organizational tool. 
It lets users customize their entire infor- 
mation space by using Prospero links 
and filters. Nevertheless, unlike WWW 
and hypertext systems in general, in 
which any node can link to any other 
node, Prospero can link a directory node 
only to another directory node or to a 
file node. 

When responding to user queries, ser- 
vices such as Archie, WAIS, Netfind, 
and Indie search their indexing data- 
bases for relevant information. These 
tools build their indexing databases from 



information distributed throughout the 
Internet. Because it is built atop the
Distributed Hypertext scheme, Indie 
also has the potential for allowing users 
to organize their information space into 
a distributed hypertext. 

Granularity. The granularity with which discovery tools manipulate objects
is another distinguishing feature. In Archie, for example, target objects
are filenames instead of file contents. Therefore, Archie can be used only
to locate information stored in files that have meaningful names. WAIS
indexes the contents of documents, so users can find interesting documents
by submitting keyword-based queries. In Semantic File Systems, users can
access self-contained portions of a file, such as procedures in a source
code file or individual messages in a mail file.
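
The difference in granularity between a file-name index and a full-text index can be illustrated with a small Python sketch. It is a toy model under stated assumptions: substring matching on names for the Archie-style lookup, and a plain inverted index with conjunctive keyword queries for the WAIS-style lookup, without any relevance ranking. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def filename_search(files: Dict[str, str], term: str) -> List[str]:
    """Archie-style lookup: match the term against file names only."""
    return sorted(p for p in files if term.lower() in p.lower())


def build_content_index(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Full-text index: map each word to the files whose contents contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, text in files.items():
        for word in text.lower().split():
            index[word.strip(".,;:")].add(path)
    return index


def content_search(index: Dict[str, Set[str]], *keywords: str) -> List[str]:
    """Files whose contents contain every keyword."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*sets)) if sets else []
```

With these two functions, a file called `notes.txt` that explains how to install Gopher is invisible to the name search for "gopher" but is found by the content search, which is the point made about meaningful file names.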

Organizing the information space. Discovery tools organize searchable data
into some kind of information space. Usually, browsing tools organize
their information space as a directed graph, with nodes connected by
links. Prospero, Gopher, WWW, and Alex belong to this category. WWW is a
step toward hypertext systems. It allows links between nodes of any kind.
On the other hand, indexing services, such as Archie, Netfind, and Indie,
tend to organize searchable information into indexing databases, which
allows efficient exhaustive searches.

Directory for the directory

The following information lets users access Internet resource discovery
services or find their software packages.

Alex. Service available via Sun NFS "mount -o timeo=30,retrans=300,soft,intr
alex.sp.cs.cmu.edu:/ /alex".

Archie. Service available via Telnet to host archie.mcgill.ca; log in as
archie.

Gopher. Client and server software available via anonymous FTP from
host boombox.micro.umn.edu under directory pub/gopher.

Indie. Software available via anonymous FTP from host jerico.usc.edu
under directory pub/Indie/Indie.tar.1.2.2.

KIS. Service available via Telnet to host nri.reston.va.us on port 185.

Netfind. Service available via Telnet to host bruno.cs.colorado.edu; log in
as netfind.

Prospero. Client and server software available via anonymous FTP from
host prospero.isi.edu under directory pub/prospero.

WAIS (Wide Area Information Servers). Client and server software
available via anonymous FTP from host think.com under directory wais.

WWW (World-Wide Web). Client and server software available via anonymous
FTP from host info.cern.ch under directory pub/WWW.

Data distribution. In addition to the subject of how discovery tools
organize data, there is also the question of where this data is stored. In
services that employ a graph-based organization, data is usually
distributed among geographically dispersed servers. Prospero, Gopher, and
WWW belong to this group of services. On the other end of the design
spectrum, tools like Archie and Netfind build centralized and replicated
indexing databases. Indie and WAIS build distributed indexing databases.
WAIS servers store both the indexing database and the corresponding data.
Indie's indexing databases are distributed among Indie brokers according
to their topic of specialization.

Directories of services. Indie, WAIS, and Nomenclator have implemented
directories of services. Discovery sessions can start with a query to the
directory of services, which provides users with hints on places to
search. The WAIS directory of services knows about all participating WAIS
servers and, when responding to a user's query, provides a list of
relevant servers. Similarly, Indie's directory responds to user queries
with a list of relevant Indie brokers. For scalability and availability
purposes, Indie's directory implements Indie's replication mechanism.

The accompanying sidebar shows how
to access Internet resource discovery
services or where to find their software
packages. Figure 12 shows how to get
started with resource discovery by us-
ing Archie to find out about Gopher
distributions.



1. Contact an Archie server via Telnet as shown in the sidebar "Directory
for directories." Note that in the Archie section we showed a sample
session using the Xarchie interface. Some of the Archie servers in the US
are archie.rutgers.edu, archie.unl.edu, and archie.ans.net.

% telnet archie.rutgers.edu
Trying 128.6.18.15 ...
Connected to dorm.Rutgers.EDU
login: archie

ARCHIE: Rutgers University Archive Server [November 20 1992]
archie>

2. Query Archie for Gopher distributions.

archie> prog gopher

3. Among other hits, Archie provides the following information.

Host boombox.micro.umn.edu (134.84.132.2)
Last updated 00:15 23 Dec 1992

Location: /pub/slipdial
DIRECTORY rwxrwxr-x 512 Nov 23 13:36 gopher
Location: /pub/pc
DIRECTORY rwxr-xr-x 512 Oct 29 01:21 gopher
Location: /pub
DIRECTORY rwxr-xr-x 512 Nov 18 22:27 gopher

4. To retrieve the distribution, use anonymous FTP. Note that software
distributions are commonly named as <tool name><distribution number>.tar.Z.

% ftp boombox.micro.umn.edu
Connected to boombox.micro.umn.edu.
220 boombox FTP server (Version 4.1 Tue Apr 10 05:15:32 PDT 1990) ready.
Name (boombox.micro.umn.edu:kobraczk): anonymous
331 Guest login ok, send ident as password.
Password: <enter your user id as the password>
230 Guest login ok, access restrictions apply.
ftp> cd pub/gopher/Unix
ftp> get gopher1.1.tar.Z
ftp> quit

5. To install Gopher, proceed as follows.

• Create a directory where you want to install Gopher.

% mkdir gopher

• Move the distribution to the newly created directory, uncompress it, and
restore its directory structure using the tar command.

% mv gopher1.1.tar.Z gopher/.
% cd gopher
% uncompress gopher1.1.tar.Z
% tar -xf gopher1.1.tar

• Go to the client subdirectory.

% cd client

• Make the appropriate environment changes. Essentially, you should edit
the Makefile and choose the appropriate machine definition. For instance,
if you are installing Gopher on a Sun, you should uncomment
MACHDEFS = -DIS_A_SUN.

• Make, install, and run Gopher.

% make
% make install
% gopher

Figure 12. How to get started with resource discovery.

Internet resource discovery services have proliferated because of the
continually growing number of hosts on the Internet and the corresponding
increase in the amount of available information. Discovery tools can be
classified as browsing or indexing tools. Browsing tools organize their
information space as a directed graph and allow users to find data of
interest while navigating the information space. Gopher, Prospero, and WWW
fall in this category. Indexing services, such as Archie, Netfind, and
Indie, organize searchable information into indexing databases and respond
to user queries by searching their databases for relevant information.

The first Internet discovery tools were developed independently. However,
the current trend is interoperability, whereby users of one discovery
service can access information available through other services. For
instance, Gopher clients can retrieve objects from Archie, WAIS, and FTP
servers. WWW clients understand FTP archives, Gopher servers, and Archie.
Efforts to enhance resource discovery interoperability include the
standardization of object identifiers.

Besides enhanced interoperability, future directions in the resource
discovery arena include answering questions that range from
research-oriented problems (such as how to index information using
nontextual indexes, for example, indexing pictures by description and
shape) to commercially oriented questions, such as how to bill for data
access.